GPGPU-Accelerated Instruction Accurate and Fast Simulation of Thousand-core Platforms
Authors
Abstract
Future architectures will feature hundreds to thousands of simple processors and on-chip memories connected through a network-on-chip. Architectural simulators will remain the primary tools for design space exploration and for performance (and power) evaluation of these massively parallel architectures. However, architectural simulation performance is a serious concern, as current virtual platforms and simulation technology cannot tackle the complexity of future 1,000-core scenarios. The main contribution of this paper is the development of a simulator for 1,000-core processors that exploits the enormous parallel processing capability of low-cost and widely available General Purpose Graphics Processing Units (GPGPUs). We demonstrate our GPGPU simulator on a target architecture composed of several ARM ISA based cores, with instruction and data caches, connected through a Network-on-Chip (NoC). Our experiments confirm the feasibility of our approach. Our ongoing work focuses on developing power models within the simulation engine.
Similar resources
GPGPU Accelerated Sparse Linear Solver for Fast Simulation of On-Chip Coupled Problems
Continued device scaling into the nanometer region has given rise to new effects that previously had negligible impact but now present greater challenges to designing successful mixed-signal silicon. Design efforts are further exacerbated by unprecedented computational resource requirements for accurate design simulation and verification. This paper presents a general purpose graphic processing...
Performance Analysis and Tuning for General Purpose Graphics Processing Units (GPGPU)
General-purpose graphics processing units (GPGPU) have emerged as an important class of shared memory parallel processing architectures, with widespread deployment in every computer class from high-end supercomputers to embedded mobile platforms. Relative to more traditional multicore systems of today, GPGPUs have distinctly higher degrees of hardware multithreading (hundreds of hardware thread...
A Novel Synchronization Technique for Fast and Accurate Multi-core Instruction-set Simulation
This paper proposes a synchronization technique for fast and accurate Multi-Core Instruction-Set Simulation (MCISS). Traditionally, a lock-step approach, which synchronizes every cycle, is commonly used to achieve accurate simulation results of MCISS. However, this approach results in immense overhead and low simulation speed. Rather than synchronizing every cycle, our approach synchronizes the...
A Novel Timing Synchronization Method for Fast and Accurate Multi-core Instruction-set Simulation
This paper proposes a timing synchronization method for fast and accurate Multi-Core Instruction-Set Simulation (MCISS). In order to achieve accurate simulation results of MCISS, a lock-step approach, which synchronizes every cycle, is commonly used. However, this approach introduces immense overhead and lowers the simulation speed. Instead of synchronizing every cycle, our approach synchronize...
Cycle-Accurate 64-Core FPGA-Based Hybrid Simulator
Nowadays, computer architecture research mainly focuses on multicore hardware and software design. Compared with their traditional uniprocessor counterparts, multicore simulators are dramatically more complex, a trend driven by the increase in core count. Full-system fidelity, fast simulation speed, and cycle-level accuracy are the essential requirements of the advan...